A unified evaluation of iterative projection algorithms for phase retrieval 
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Iterative projection algorithms are successfully being used as a substitute of lenses to recombine, 
numerically rather than optically, light scattered by illuminated objects. Images obtained compu- 
tationally allow aberration-free diffraction-limited imaging and the possibility of using radiation for 
which no lenses exist. The challenge of this imaging technique is transferred from the lenses to the 
algorithms. We evaluate these new computational "instruments" developed for the phase-retrieval 
problem, and discuss acceleration strategies. 



Crystallographers routinely image molecular struc- 
tures of several thousand atoms by phasing the diffrac- 
tion pattern of a structure replicated in a periodic sys- 
tem. Likewise, computationally retrieving the phase of a 
diffraction pattern is becoming increasingly successful at 
imaging - with several millions of resolution elements- 
objects as complex as biological cells, nanotubes and 
nanoscale aerogel structures. Diffraction microscopy (the 
imaging of isolated objects by diffraction and computa- 
tional phase retrieval) promises a 3D resolution limited 
only by radiation damage, wavelength, the collected solid 
angle and the number of x-rays or electrons collected. 
This capability provides an extremely valuable tool for 
understanding nanoscience and cellular biology. Recent 
estimates [l| of the dose and flux requirements of x-ray 
diffraction on single objects indicate that attractive res- 
olution values (about 10 nm for life science and 2-4 nm 
for material science) should be possible at a modern syn- 
chrotron. Atomic resolution could be accomplished using 
pulses of x-rays that are shorter than the damage process 
itself [2I, 0] using femtosecond pulses from an x-ray free- 
electron laser Q. Alternatively the radiation damage 
limit could be eliminated by continuously replacing the 
exposed samples, such as laser-aligned molecules [5] with 
identical ones. 

In the fields of electron microscopy [6|] and astronomi- 
cal imaging 0], iterative projection algorithms have been 
used to recover the phase information in a variety of prob- 
lems. The evaluation of the aberrations in the Hubble 
space telescope described by Fienup in [8] remains per- 
haps the most prominent example of successful phase re- 
constructions in the astronomical community. Nugent 
and collaborators applied similar techniques to charac- 
terize x-ray lenses [9|. In electron diffraction microscopy 
B 113 5 Zuo and coworkers imaged a single isolated nan- 
otube at atomic resolution [ijj, Wu et al. imaged defects 
at atomic resolution [12II. 
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An important review, which attempted to integrate the 
approaches of the optical and crystallographic communi- 
ties, appeared in 1990 [1^. The connection was made be- 
tween the "solvent-flattening" ^ "density- modification" 
techniques of crystallography [14] and the compact sup- 
port requirements of the iterative projection algorithms. 
The importance of fine sampling of the intensity of the 
measured diffraction pattern was recognized at an early 
stage [15]. 

The observation by Sayre in 1952 [16] that Bragg 
diffraction undersamples the diffracted intensity pattern 
was important and led to more specific proposals by 
the same author for x-ray diffractive microscopy of non- 
periodic objects [r3, [1^. These ideas, combined with 
the rapid development of computational phase retrieval 
in the wider optics community, especially the "support 
constraint" ^, |7|, |1^, ll^ , enabled the first successful use 
of coherent x-ray diffraction microscopy (CXDM). 

Since the first proof of principle demonstration of 
CXDM by a team at Stony Brook [21], a number of 
groups have been working to bring these possibilities into 
reality. 

Robinson and co-workers at the University of Illinois 
have applied the principles of CXDM to hard x-ray ex- 
periments on microcrystalline particles. Such data have 
been reconstructed tomographically to produce a 3D im- 
age at 80 and more recently 40 nm resolution [22|, [Hi- 
Miao (now at UCLA) and co-workers made consider- 
able progress in pushing the CXDM method at Spring-8 
Japan to higher resolution in 2D (7 nm), higher x-ray 
energies, and to a limited form of 3D [2J]. They have 
also made the first application of CXDM to a biological 
sample [25]. 

A diffraction chamber dedicated to diffraction mi- 
croscopy [26] has been used to image biological cells 
[27I . [28] at the Advanced Light Source in Berkeley [29^ . 
Using the same chamber, a collaboration between Berke- 
ley and Livermore labs and Arizona State University pro- 
duced 3D imaging at 10 x 10 x 40 nm resolution of test 
samples [30] as well as aerogel foams [sH. 

In this article the computational instruments that en- 
abled these and other results are reviewed. Section 
[11 introduces the phase problem and the experimental 
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requirements for diffraction microscopy, Section [III de- 
scribes the concepts of sets of images and their projec- 
tors. In Section [TTll the iterative projection algorithms 
pubhshed in the hterature are summarized and tested 
on simple geometric sets. In Section IIVI the connection 
between projection- and gradient- based methods and re- 
lated acceleration strategies are discussed. 



I. THE PHASE PROBLEM 

When we record the diffraction pattern intensity scat- 
tered by an object, the phase information is missing. 
Apart from normalization factors, an object of density 
p(r), r being the coordinates in the object (or real) 
space, generates a diffraction pattern equal to the mod- 
ulus square of the Fourier transform (FT) p{k): 



i{k) 
m 



\m\' 
pt(fe)p(fe), 



(1) 



where k represent the coordinate in the Fourier (or Re- 
ciprocal) space. The inverse Fourier transform (IFT) 
of the measured intensity / provides the autocorrelation 
p{—r) * p{r) of the object: 



lFT[I{k)] = p{-r) * p{r) . 



(2) 



The phase-retrieval problem consists of solving p in Eq. 
([1]) or p in Eq. ([2]), using some extra prior knowledge. 
In diffraction microscopy, solving such problem is per- 
formed with giga-element large-scale optimization algo- 
rithms, described in the following section. 

Since the intensity represents the FT of the autocorre- 
lation function, and the autocorrelation is twice as large 
as the object, the diffraction pattern intensity should be 
sampled at least twice as finely as the amplitude to cap- 
ture all possible information on the object. Finer sam- 
pling adds a 0-padding region around the recovered au- 
tocorrelation function, which adds no further informa- 
tion (Shannon theorem). Less than critical sampling in 
the Fourier domain causes aliasing in the object space. 
A periodic repetition of the same structure provides a 
stronger signal, enabling the measurement of the diffrac- 
tion pattern before the structure is damaged. However, 
while an isolated object generates a continuous diffrac- 
tion pattern that can be sampled as finely as desired, a 
periodic repetition of the same object generates only a 
subset of the possible diffraction intensities. Crystallog- 
raphy therefore has to deal with an aliased autocorrela- 
tion function, also known as the Patterson function. This 
reduced information can be compensated by other prior 
knowledge, such as the atomic nature of the object being 
imaged, knowledge of a portion of the object, presence of 
heavy atoms, and information obtained with anomalous 
diffraction. Other information includes the presence of a 
solvent in the crystal. By varying the sampling rate of a 
diffraction pattern it was shown til, 112,131 that less than 



critical sampling was sufficient to solve the phase prob- 
lem. This was possible because the number of equations 
(measured intensities in Eq. ([1])) in the two- and three- 
dimensional phase-retrieval problems is larger than the 
number of unknowns (resolution elements in the object). 
The number of unknowns defines the number of indepen- 
dent equations, or the minimum required sampling rate. 
Although no general proof has been provided that lim- 
ited sampling removes only redundant equations, such a 
minimum required sampling rate suggests that when the 
solvent exceeds 50% of the crystal volume, the algorithms 
developed in the optical community, using techniques to 
dynamically refine the solvent regions [34] may be able 
obtain ab-initio structural information from crystals. 

Coherence is required to properly sample the FT of the 
autocorrelation of the object [35]. According to the Schell 
theorem [36|, the autocorrelation of the illuminated ob- 
ject obtained from the recorded intensity is multiplied 
by the complex degree of coherence. The beam needs 
to fully illuminate the isolated object, and the degree of 
coherence must be larger than its autocorrelation. 

Diffraction microscopy solves the phase problem by us- 
ing the knowledge that the object being imaged is iso- 
lated; it is assumed to be outside a region called sup- 
port S: 



p(r) = 0, ifr^S. 



(3) 



This support is equivalent to the solvent in crystallogra- 
phy. Equations ^ and (|3|) can be combined to obtain 
a multidimensional system of quadratic equations in the 
p{r) variables: 



p{r) ex.p{ik • r) 



res 



(4) 



which is a quadratic equation in the p{r) variables with 
coefficients Cr,r'{k) = cos{k • (r — r^)): 



E cr,r'{k)p{r)p*{r') = I{k). 



(5) 



r^r'eS 



Each value of I{k) in reciprocal space defines an ellipsoid 
(Eq. ([5])) in the multidimensional space of the unknowns 
p(r), {r G S}. If the number of independent equations 
equals the number of unknowns, the system has a sin- 
gle solution p{r). The intersection of these ellipsoids 
forms our solution. Unfortunately this system of equa- 
tions is difficult to solve, and has an enormous number 
of local minima. Constant phase factors, inversion with 
respect to the origin (enantiomorphs), and origin shifts 
p(dzr + ro)e*'^° are undetermined and considered equiv- 
alent solutions. The presence of multiple non-equivalent 
solutions in two- and higher- dimensional phase retrieval 
problems is rare [37]; it occurs when the density distribu- 
tion of the object can be described as the convolution of 
two or more non-centrosymmetric distributions. Simple 
homometric structures for which the phase problem is 
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not unique [38^ exist in nature, but such non-uniqueness 
is less likely for more complex structures. 

The presence of noise and limited prior knowledge 
(loose constraints) increases the number of solutions 
within the noise level and constraints. Confidence that 
the recovered image is the correct and unique one can 
be obtained by repeating the phase-retrieval process us- 
ing several random starts. Repeatability of the recovered 
images as a function of resolution measures the effective 
phase-retrieval transfer function [g^, [lOl, which can be 
decomposed in unconstrained amplitudes modes [23] and 
phase aberrations [39]. 

In the early 1980s, the development of iterative al- 
gorithms with feedback by Fienup, produced a remark- 
ably successful optimization method capable of extract- 
ing phase information 0, ElQ.] • The important theoret- 
ical insight that these iterations may be viewed as projec- 
tions in Hilbert space [4l|, [42|] has allowed theoreticians 
to analyse and improve on the basic Fienup algorithm 

0, Si, in, [3. 

These algorithms try to find the intersection between 
two sets, typically the set of all the possible objects with 
a given diffraction pattern (modulus set), and the set of 
all the objects that are constrained within a given area 
or support volume (or outside a solvent region in crystal- 
lography). The search for the intersection is based on the 
information obtained by projecting the current estimate 
on the two sets. An error metric is obtained by evalu- 
ating the distance between the current estimate and a 
given set. The error metric and its gradient are used in 
conjugate-gradient (CG) -based methods such as SPE- 
DEN \47i]. 



II. SETS, PROJECTORS AND METRICS 

An image of a density distribution can be described as 
a sequence of n pixel values. For an image of n pixels, 
there are n coordinates. The magnitude of the density at 
a pixel defines the value of that coordinate. Thus a sin- 
gle vector in this n-dimensional space defines an image. 
For complex images the number of coordinates increases 
by a factor of 2. Axes of the multidimensional space are 
formed by any sequence of n-pixels with all but one pixel 
equal to 0. An example is = (x, 0, 0) in 3-pixel solution 
space. The origin of this space is the image with all the 
pixels equal to 0. The components on these axes form the 
real or object space. The same object can be described in 
terms of any another n-dimensional orthogonal (or lin- 
early independent) bases. Axes can be rotated, shifted, 
inverted and so on, and the proper linear transform must 
be applied to obtain the components in the new basis. 
The basis used to describe the image must have at least 
n components, but more can be used if it helps to describe 
the properties of the algorithm. For example the values 
could be left to have a real and an imaginary component, 
doubling the number of dimensions used to describe the 
object. 



V(r3) 




(a) 



(b) 



FIG. 1: Examples of sets and projectors: (a) Support: The 
axes represent the values on 3 pixels of an image p known to 
be outside the support S. The vertical axis ^(rs) represents 
a pixel outside (rs G S_), while the horizontal plane repre- 
sents pixels inside S. The projection on this set is performed 
simply by setting to all the pixels outside the support, (b) 
Modulus: A pixel (in Fourier space) with a given complex 
value is projected on the closest point on the circle defined by 
the radius m. If there is some uncertainty in the value of the 
radius m =b ^, the circle becomes a band. The circle is a non- 
convex set, since the linear combination between two points 
on the same set pi and p2 does not lie on the set. Also repre- 
sented in the figure is the projection on the real axis (reality 
projection). 



One important basis is the momentum or Fourier 
space. While the vector in the n-dimensional space rep- 
resenting an image is unaltered on transforming from real 
to reciprocal space, its components in the new axes are 
altered (Fourier-transformed). The distance between two 
points in the n-dimensional space is independent of this 
transformation (Parseval theorem). The lengths and the 
angles between vectors will be our guide to describe the 
behavior, convergence and error metrics of these algo- 
rithms. 

We consider two sets, S (support) and M (modulus). 
When the image belongs to both sets simultaneously, we 
have reached a solution. If the properties of the object 
being imaged are known a-priori to be limited in a sup- 
port region, we know that in the n-dimensional space of 
the pixel values, some values must be zero. Images that 
satisfy this rule (Eq. (|3])) form the support constraint 
set. A projection onto this set (Pg) involves setting to 
the components outside the sup port, while leaving the 
rest of the values unchanged (Fig. 1(a)): 



PsP{r) 



p{r) ifr e S 



otherwise, 



(6) 



and its complementary projector Ps_ = I — Pg- 

The values in every pixel in Fourier space can be de- 
scribed using two components, the real and imaginary 
parts, or amplitude and phase, both defining a point in 
a complex plane. In an intensity measurement we obtain 
the amplitude or modulus in every pixel that defines a cir- 
cle in a complex plane. These circles define the modulus 
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FIG. 2: The reflector applies the same step as the projector 
(P - /) twice: Rp = Ip+ 2[P - I]p 



constraint (Fig. |l(b)[ ). When every complex- valued pixel 
lies on the circle defined by the corresponding modulus, 
the image satisfies this constraint and it belongs to the 
modulus set. Segments joining two points on a circle do 
not belong to the circle; therefore the linear combination 
of two images is outside the set: the set is non-convex. 
These sets are problematic because of the presence of 
local minima and undefined projections. 

The projection of a point in each complex plane onto 
the corresponding circle is accomplished by taking the 
point on the circle closest to the current one, setting the 
modulus to the measu red o ne and leaving the 

phase unchanged (Fig. |l(b)[ ): 



Pm~p(k) = Pm\~p(k)\e 



(7) 



where we have defined the reciprocal space representation 
of the projector: 



-J 7T?V 1 



(8) 



and T and represent the forward and inverse Fourier 
transforms respectively. 

This operator is demonstrated to be a projector on the 
non-convex (Fig. |l(b) ) set of the magnitude constraint 
[48*] . The same paper discusses the problems of multi val- 
ued projections for non-convex sets, which do not satisfy 
the requirements for gradient-based minimization algo- 
rithms, and the related non-smoothness of the squared 
set distance metric, which may lead to numerical insta- 
bilities. See also |49|] for a follow-up discussion on the 
non-smooth analysis. 

A projector P is an operator that takes to the clos- 
est point of a set from the current point p. A rep- 
etition of the same projection is equal to one projec- 
tion alone (P^ = P); its eigenvalues must therefore be 
A = 0, 1. Another operator used here is the reflector 
H = /+2[P— J] = 2P—I^ which applies the same step as 
the projector but moves twice as far (Fig[2|). In the case 
of the support constraint, the whole image space can be 
described in terms of the eigenvectors of the correspond- 
ing linear projector. These eigenvectors with eigenvalues 
of 1 (0) are the images with all the pixels equal to 0, 



except for one pixel inside (outside) the support. The 
modulus projector is a non-linear operator: 



Pm{aa) ^ aPm{a), 



(9) 



and it cannot be described in terms of eigenvalues and 
eigenvectors. 

The Euclidean length ||p|| of a vector p is defined as: 



p"^ - P 



(10) 



The sum is extended to the measured portion of the 
diffraction pattern. If part of the reciprocal space is not 
measured, it should not be included in the sum. In fact 
the sum should be weighted with the experimental noise 



(11) 



with o-{k) = oo for values of k not measured. The dis- 
tance from the current point to the set \\Pp — p\\ is the 
basis for our error metric. Typically the errors in real 
(Es) and reciprocal space (sm) are defined in terms of 
their distance to the corresponding sets: 



£s{p) = 
£m{p) = 



\PsP- 
\PmP 



P\\. 



(12) 

or their normalized version Sx{p) = \^\p^J\ \ • Another error 
metric used in the literature is given by the distance be- 
tween the two sets: Ss^mip) = \\PmP — PsPW- The projec- 
tor Pm moves p to the closest minimum of e'^{PmP) = 0, 
providing a simple relation with the gradient Vp6:^(p) 

MM- 

Pr^p = p+ [P^ -I]p = p- iVp£^(p) , (13) 



where \/pe'^{p) is proportional to VpSm{p)' 
^^P^mip) = 2e^{p)Vpe„,{p). 



(14) 



For p{k) = 0, Sm is non different iable, and the projector 
Pm is multivalued [48]. The presence of complex zeros 
{p{k) = 0) is considered of fundamental importance in 
the phase-retrieval problem [50], and the phase vortices 
associated with these zeros cause stagnation in iterative 
algorithms [51]. Several methods have been proposed to 
solve this problem [24, [H, [5l|, [H, [H. Similarly the 
projector Pg minimizes the error e^: 



(15) 



III. ITERATIVE PROJECTION ALGORITHMS 

Several algorithms based on these concepts have now 
been proposed and a visual representation of their be- 
havior is useful to characterize the algorithm in various 
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TABLE I: Summary of various algorithms 



Algorithm 


Iteration p^""^^^ = 


Tim? 




QT? 

or 




HIO 


- p±^m)p^ ^ [r) rfb 


DM 


{I + l3Ps [(1 + 7s)Pm -7s/] 

- pPrrr [(1 + 7m) P. " 7m/]}p<"' 


ASR 




HPR 


\[Rs (J?^+(/3-l)P^) 
+/+(l-/3)P^]p("> 


RAAR 


(RsRrr. +/) + (!- /3)P™] 



situations, in order to help choose the most appropriate 
one for a particular problem. In this section the projec- 
tion algorithms published in the literature are summa- 
rized (see also Table H]) and tested on simple geometrical 
sets. 

The following algorithms require a starting point p^, 
which is generated by assigning a random phase to the 
measured object amplitude (modulus) in the Fourier do- 
main \p{k)\ = m{k) = ^/I{k). The first algorithm called 
error reduction (ER ) (Gerchberg and Saxton [gI. [4iI. IHS] ) 
is simply (Fig. 3(a) ): 



^(n+l) 



P P 



(16) 



and by projecting back and forth between two sets, it 
converges to the local minimum. The name of the algo- 
rithm is due to the steps moving along the gradient of 
the error metric (see Eq. (p!3|) ): 



PsPmP Psp- 



2^ s^n 



(17) 



where Vg = PgV is the component of the gradient in the 
support. Figure [3(a) shows that the step size is far from 
optimum, but that it guarantees linear convergence. A 
line search along this gradient direction would consider- 
ably speed up the convergence to a local minimum and 
will be discussed in Section HVl 

The solvent flipping (SF) algorithm [55[ is obtained 
by replacing the support projector Ps with its reflector 
Rs = 2Ps -I (Fig. [3(b)]) : 



9(^+1) = RsPrr^P^''^ 



(18) 



which multiplies the charge density p outside the support 
by -1. The hybrid input-output (HIO) [7, 20] (Fig. |3(c)D 
is based on non-linear feedback control theory and can 
be expressed as: 



{x) = 



Pmp^''\x) lixeS, 
{I - f3Pm)p'^''^ {x) Otherwise. 



(19) 



Equations (p!3|) and ([15]) can be used to describe the steps 
{Ap = p'^^'^^^ — p'^^^) in terms of the gradients of the 



error metrics. In Section IIVI it will be shown that this 
algorithm seeks the saddle point: 



min max>C(p), 

ps Ps_ 



C{p)= slip) -slip) 



(20) 



by moving in the descent-ascent direction {[—Ps + 
f3Ps\VC) (see Section [IVj for details), rather than in the 
simple error-minimization direction. 

It is often used in conjunction with the ER algorithm, 
alternating several HIO and one or more ER iterations 
(HIO(20)+ER(1) in our case). In particular one or more 
ER steps are used at the end of the iteration. Elser 
pointed out that the iterate p^ can converge to a fixed 
point {p^~^^ = p^), which may differ from the solution p 
{PsP = Pmp = p)' However the solution p can be easily 
obtained from the fixed point: 



PmP"", 

(1 + ^)PsPmP" 



(21) 



where pm and ps should coincide, or else their difference 
can be used as an error metric. See [43] for further details. 

The difference map (DM) is a general set of algorithms 
[43^], which requires 4 projections (two time-consuming 
modulus constraint projections) (Fig. |3(d)[ ): 



^(n+l) _ 



{ / -^(3Ps[{l^-fs)Prr 
- f3Pm[{l^-fm)Ps 



■7m/]}p^"^ (22) 



the solution corresponding to the fixed point is described 
in the same article [43| . We will use in the upcoming tests 
what Elser suggested as the optimum, with 7^ = 
and 7^ 

The averaged successive reflections (ASR) 0] algo- 
rithm is: 



P 



(n+l) 



■l]p 



(n) 



(23) 



The Hybrid Projection Reflection (HPR) [45[ algorithm 
is derived from a relaxation of the ASR: 



P 



(n+l) 



\[Rs{Rrr^^{P-l)Prr^) 
I+(l-/3)P^]p(^). 



(24) 



It is equivalent to HIO if posit ivity (Section [TTT|) is not 
enforced, but it is written in a recursive form, instead of 
a case-by-case form such as Eq. ([T9|) . It is also equivalent 
to the DM algorithm for 7^ = —1, 7^ = Finally the 
relaxed averaged alternating reflectors (RAAR) algorithm 
H: 



0(^+1) - 



[\P{RsRrr^^I)^{l-P)Prr^]p 



in) 



(25) 



For = 1, HIO, HPR, ASR and RAAR coincide. 

The first test is performed on the simplest possible 
case: find the intersection between two lines. Figure 
m shows the behavior of the various algorithms. The 
two sets are represented by a horizontal blue line (sup- 
port) and a tilted black line (modulus). ER simply 
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FIG. 3: Geometric representation of various algorithms us- 
ing a simplified version of the constraint: two lines intersect- 
ing, (a) Error reduction algorithm: we start from a point on 
the modulus constraint by assigning a random phase to the 
diffraction pattern. The projection onto the modulus con- 
straint finds the point on the set which is nearest to the cur- 
rent one. The arrows indicate the gradients of the error met- 
ric, (b) The speed of convergence is increased by replacing 
the projector on the support with the reflector. The algo- 
rithm jumps between the modulus constraint (solid diagonal 
line) and its mirror image with respect to the support con- 
straint (dotted line), (c) Hybrid input-output, see text (Eq. 
([T9|)). The space perpendicular to the support set is repre- 
sented by the vertical dotted line S_. (d) Difference map, see 
text (Eq. ([22))). 



projects back and forth between these two lines, and 
moves along the support line in the direction of the in- 
tersection. SF projects onto the modulus, 'reflects' on 
the support, and moves along the reflection of the mod- 
ulus constraint onto the support. The solvent flipping 
algorithm is slightly faster than ER thanks to the in- 
creased in the angle between projections and reflections. 
HIO and variants (ASR, DM, HPR) move in a spiral 
around the intersection, eventually reaching the intersec- 
tion. For similar f5 RAAR behaves somewhere in between 
ER and HIO with a sharper spiral, reaching the solution 
much earlier. Alternating 20 iterations of HIO and 1 of 
ER (HIO(20)+ER(1)) considerably speeds up the con- 
vergence. 

W hen a gap is introduced between the two lines (Fig. 
|4(b)[ ), so that they do not intersect, HIO and variants 
move away from this local minimum in search of another 
'attractor' or local minimum. This shows how these algo- 



rithms escape from local minima and explore the multi- 
dimensional space for other minima. ER, SF and RAAR 
converge to or near the local minimum. By varying ^ 
RAAR becomes a local minimizer for small /3, and be- 
comes like HIO for :^ 1. ER, SF and HIO+ER converge 
to the local minimum in these tests. 

A more realistic example is shown in Fig. [5l Here 
the circumference of two circles represents a non-convex 
set (modulus constraint), while the support constraint is 
represented by a line. The convex set represents a sim- 
plified modulus constraint in a phase-retrieval problem. 
The advantage of this example is the simplicity in the 
'modulus' projector operator (it projects onto the closest 
circle) . 

We start from a position near the local minimum. ER, 
SF, and HIO+ER fall into this trap (Fig. [5]). HIO and 
variants move away from the local minimum, 'find' the 
other circle, and converge to the center of the circle. 
In the center the projection on the modulus constraint 
becomes 'multivalued', and its distance metric is 'non- 
smooth'. Such a point is unstable, and the algorithms 
start spiralling toward the solution. For = 0.75, RAAR 
does not reach the solution, but converge close to the lo- 
cal minimum. 



Positivity 

The situation changes slightly when we consider the 
positivity constraint. The previous definitions of the al- 
gorithms still apply, just replacing with Ps+: 




if r G 6' & p{r) > , 
otherwise. 



(26) 



The only difference is for HIO which becomes: 



P 



(,+1) ^ f Prr^pM (r) ifreSk Prr^p^^^ (r ) > 0, 

[ (1 - pPm)p^''^ Otherwise. 

(27) 

HIO and variants follow the saddle-point direction, mov- 
ing away from local minima (Fig. [6]), but as they ap- 
proach the solution they react differently to the positiv- 
ity constraint, with HIO "bounching" at the x = axis, 
and ASR/HPR/DM proceeding more smoothly toward 
the solution. 



IV. STEEPEST DESCENT, CONJUGATE 
GRADIENT, AND MIN-MAX ALGORITHMS 

As discussed in Section IIIIl the error reduction (ER) 
algorithm moves in the direction of the steepest descent 
[20|; however the step length is not optimized to reach 
the local minimum in that direction, since it is only one 
component of the full gradient (Fig. 
egy is generally referred to as reducec 
Figure [3 shows the error metric e. 



3(a)). Such strat- 
gradient method, 
as a function of two 
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FIG. 4: The basic features of the iterative projection algorithms can be understood by this simple model of two lines intersecting 
(a). The aim is to find the intersection. The ER algorithm and the solvent flipping algorithms converge in some gradient-type 
fashion (the distance to the two sets never increases), the solvent flip method being slightly faster when the angle between the 
two Unes is small. HIO and variants move following a spiral path. The lagrangian (£ = — sj) is represented in grayscale, 
and the descent-ascent directions ([— Vs, Vs]£) are indicated by arrows. When the two lines do not intersect (b), HIO and 
variants keep moving in the direction of the gap between the two lines, away from the local minimum. ER, SF and RAAR 
converge at (or close to) the local minimum. 




Support 
Modulus 
• start 
-□— ER 

— Solvent Flip 
->— HIO 

-A— Dlff. Map, D-^p^P 

-^ASR 

HPR 
-^HIO^^-hER^ 
-0— RAAR 




FIG. 5: The horizontal line represents a support constraint, 
while the two circles represent a non-convex constraint, i.e. 
the modulus constraint. The gradient-type (ER and SF) algo- 
rithms converge to the local minimum, while HIO and variants 
follow the descent- ascent direction ([— Vs, Vs]£) indicated by 
the arrows. 



FIG. 6: Posit ivity constraint: the support constraint is rep- 
resented by a horizontal line originating from (x > 0). A 
barrier due to the positivity constraint changes the behavior 
of the algorithms, which no longer follow the descent-ascent 
direction. HIO bounces on the x — axis, while the other 
algorithms are smoother. 



unknown pixel values in a simple two-dimensional phase- 
retrieval problem, and the behavior of the ER algorithm 
toward the local minima. 

The simplest acceleration strategy, the steepest de- 
scent method, uses the steepest direction (gradient) and 
performs a line search of the local minimum in the de- 



scent direction: 

min^ el, {p + 5Ap) , (28) 
Ap = -^Vsslip) = -Ps[I -Pm]p. 

where Vg = PgVp is the gradient with respect to ps. 
At a minimum any further movement in the direction of 
the current step increases the error metric; the gradient 
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direction must be perpendicular to the current step. In 
other words the current step and the next step become 
orthogonal: 



85 



(p- 



■6Ap) 




{Ap\P,[I-Pr^ 

(Ap,|[/-Pj 



(p + SAps)), , (29) 



where {x\y)r = 3? {x^ • y)- The line search algorithm can 



use e^, and/or its derivative in Eq. (|29|) . This optimiza- 
tion should be performed in reciprocal space, where the 
modulus projector is a diagonal operator and is fast to 
compute (Eq. ([7j)), while the support projection requires 
two Fourier transforms: 



(30) 



The steepest descent method is known to be ineffi- 
cient in the presence of long narrow valleys, where im- 
posing that successive steps be perpendicular causes the 
algorithm to zig-zag down the valley. This problem 
is solved by the non-linear conjugate gradient method 
[13, m, [Ei, HO, EH, ii, m. instead of moving in the di- 
rection of steepest descent Ap^, we move in the conjugate 
direction Ap^: 



(n) 



in) 
s 

An) 



(n-l) 



if n = 1 , 
otherwise. 



with 75 given by the Polak-Ribiere method [6l|: 

_ (Ap(-)|Ap(-)-Ap(-^))^ 



(—1)112 



(31) 



(32) 



and forced to be positive: 7 = max(7s,0) to improve 
its reliability. The presence of local minima shown in 
the previous chapters, however, will cause stagnation 
of steepest and conjugat e gra dient methods, preventing 
global convergence (Fig. |7(c) ). 



Feedback and the saddle-point problem 

The ability to escape local minima demonstrated by 
input-output feedback-based algorithms (Fig. |7(d)[ ) 
makes them superior to the methods based on simple 
gradient minimization of the error. However, as in the 
ER algorithm, the step length is not optimized, the al- 
gorithm keeps moving in the same direction for several 
steps, and sometimes overshoots. Combining the ideas 
of the conjugate gradient or the steepest descent meth- 
ods and 10 feedback could considerably speed-up conver- 
gence. Given the lagrangian C defined as the difference 
between the two errors: 

^P) = elip) - slip) , (33) 

using equations ([13]) and ([15]) we obtain the gradient: 

V£(p) = 2[Ps-Pm]p- (34) 



The step Ap used in HIO (Eq. [19]) can be expressed in 
terms of this gradient V£: 



Ap 



P' 

= {P,[Pm-I]-PP^Pm}p, 

= {Ps[P^ - Ps] - f3Ps_[Pm - Ps]}p , 

= {-P,+/3PJiV£(p). (35) 

HIO/HPR/ASR algorithms move toward the minimum 
of C in the subspace ps, and the maximum in the sub- 
space Ps , using a reduced gradient optimization strategy, 
where the step is proportional to the gradient but with 
one sign reversal (Eq. ([35]) ). In other words, they seek 
the saddle point: 



min max C{ps -\- Ps) - 

ps Ps_ 



(36) 



Min-max or saddle-point problems arise in fields as var- 
ious as game theory, economics, physics, engineering, and 
primal-dual optimization methods. Function minimiza- 
tion is easier than saddle-point optimization because a 
simple function evaluation can tell us if a new point is 
better than the previous one. The saddle can be higher 
or lower than the current value, although the direction 
toward the saddle is indicated by the two gradient com- 
ponents. One option is to alternate minimization in the 
direction ps and maximization in the direction ps_ of jC. 
Such a strategy is similar to alternating HIO and ER 
algorithms and can be performed using off-the-shelf op- 
timization routines, but it can be slow. Optimization of 
the step length, a multiplicative factor ^, is obtained by 
increasing a until the current and next search directions 
become perpendicular to one another (Fig. |7(e) ): 



{Apm-(3P^Wjr{p + 6Ap))^ = 0, 
{Ap\{P,[Pm-I]-PP^Pm}{p + SAp))^ = 0.(37) 

In analogy to the conjugate gradient method, one could 
substitute the search direction Ap with Ap, as in Eq. 
(|3T]) (Fig. |7(f)[ ). A more robust strategy involves replac- 
ing the one-dimensional search with a two-dimensional 
optimization of the saddle point (Fig. 7(g)[ ): 



min max (a, /3) , 

a (3 

V^(a, (3) = C{p + aAps + f3Aps_) . 



(38) 



Once the 2D min-max problem is solved, the new direc- 
tions can be obtained by following the conjugate gradient 
scheme (Fig. |7(i)[). 



V. CONCLUSIONS 

Lensless imaging owes its success as an effective tool 
to observe nanoscale systems to the advances made in 
phase-retrieval algorithms. The new instruments re- 
placing lenses are the iterative projection algorithms for 
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(a) Reduced Gradient Method (ER) (b) Steepest Descent (c) Conjugate Gradient 




(g)2D saddle optimization (h)2D conjugate saddle optimization (1) (i)2D conjugate saddle optimization (2) 

FIG. 7: A simple 2-D phase-retrieval problem: only two variables (pixel values) are unknown. The solution -the global 
minimum- is the top minimum in the figures. The colormap and contour lines represents the error metric £m(ps), and the 
descent direction is indicated by the arrows. The error reduction algorithm (a) proceeds toward the local minimum without 
optimizing the step length and stagnates at the local minima. The steepest descent method (b) moves toward the local 
minimum with a zig-zag trajectory, while the conjugate gradient method reaches the solution faster (c). The HIO method 
generally converges to the global minimum, however some rare starting points converge to a local minimum (d). The saddle- 
point optimization with optimized step length (Eq. I37|) stagnates in the same local minimum as HIO (e). The conjugate gradient 
version avoids stagnation (f). The saddle point optimization using a two dimensional search of the saddle point reaches the 
global minimum from a larger range of starting points than HIO (g). The conjugate gradient version (h, i) reaches the solution 
faster if the conjugate directions Aps,s are obtained independently from Aps,s (i), rather than their sum Ap = Aps + Aps_. 
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FIG. 8: Test figure used for benchmarking. The object of 128^ 
elements is surrounded by empty space. The whole image has 
256^ elements. The Fourier transform of this image provides 
the data set, and its area defines the support 



phase retrieval. These algorithms can be grouped in two 
categories: (1) local minimizers such as ER, SF, steep- 
est descent and conjugate-gradient methods, with Sol- 
vent Flip having some moderate ability to escape local 
minima [55]. (2) more global minimizers such as HIO, 
DM, ASR, HPR which use a feedback to reach the so- 
lution. RAAR and ER+HIO fall somewhere in between 
the two categories, depending on an adjustable param- 
eter. A simple benchmark is shown for comparison in 
Fig. [8] and summarized in Table [III The test consisted 
in solving a phase-retrieval problem without assuming 
positivity (nor reality) of the object, and the support 
region was slightly larger than the object, and was re- 
peated 100 times for each algorithm. Many algorithms 
surprisingly failed, and only the ones shown in Fig. [9] suc- 
ceeded. HIO appears to be the most effective algorithm, 
and it is significantly improved in terms of speed and re- 
liability when the two-dimensional step size optimization 
(S02D), as described in Eq. ([38|) is applied. Further 
improvements in reliability are achieved by performing 
a saddle-point optimization in a 4-dimensional space of 
two successive steps (S04D). Minimization algorithms, 
although not very powerful at solving the phase prob- 
lem, can be used to polish-up a solution, improving the 
values of the error metric considerably. The algorithms 
described here use as prior knowledge the support re- 
gion. Algorithms that use a simple threshold to replace 
the support [H, [5^ or more sophisticated support re- 
finement [3^ have not been discussed. Various projec- 
tion algorithms combined with some form of threshold 
have produced remarkable reconstructions of single iso- 
lated objects (HIO and RAAR [30], DM [27^), as well as 
single and powder crystals (SF and HIO with support 
refinement @, 0, |^ ) , but a full comparison of the al- 
gorithms behavior applied to this type of constraint have 
not been discussed. 
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FIG. 9: Percentage of successful reconstructions over many 
tests starting from random phases as a function of number of 
iterations. The support is the only constraint. Positivity and 
reality are not enforced, and the support is loose: it is larger 
than the object by one additional row and column 



TABLE II: Benchmark of various algorithms 



Algorithm 


No. of iterations for 
50% success 


success after 
10000 iterations 


HIO/HPR 


2790 


82% 


HIO/HPR+ER 


2379 


82.6% 


ASR 


1697^ 


42% 


S02D 


656 


100% 


S04D 


605 


100% 


Others 


> 10000 


0% 



M2% success, the algorithm either reconstruct in a limited number 
of iterations or never 
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